Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 391 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 30.7 KiB |
| Average record size in memory | 80.3 B |
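To illustrate how these overview counts are defined, the following minimal sketch recomputes them with the Python standard library for rows held as dictionaries. It assumes that a missing cell is `None` or a float NaN and that a duplicate row is any row identical to an earlier one. `dataset_statistics` is a hypothetical helper, not part of pandas-profiling.

```python
import math
from collections import Counter


def is_missing(value):
    """Treat None and float NaN as missing cells (assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def dataset_statistics(rows, columns):
    """Overview counts similar to the 'Dataset statistics' table (hypothetical helper)."""
    n_obs = len(rows)
    n_cells = n_obs * len(columns)
    missing = sum(is_missing(row.get(col)) for row in rows for col in columns)
    # Normalise missing values to None so rows with NaN in the same place compare equal.
    keys = Counter(
        tuple(None if is_missing(row.get(col)) else row.get(col) for col in columns)
        for row in rows
    )
    # Every repeat after the first occurrence of a row counts as one duplicate.
    duplicates = sum(count - 1 for count in keys.values())
    return {
        "n_variables": len(columns),
        "n_observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


rows = [{"a": 1, "b": 2.0}, {"a": 1, "b": 2.0}, {"a": 3, "b": None}]
print(dataset_statistics(rows, ["a", "b"]))
```

For this report, the same definitions give 0 missing cells and 0 duplicate rows across 391 observations of 10 variables.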
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 1 |
Warnings
| Pregnancies is highly correlated with Age | High correlation |
|---|---|
| Glucose is highly correlated with Insulin | High correlation |
| Insulin is highly correlated with Glucose | High correlation |
| Age is highly correlated with Pregnancies | High correlation |
| SkinThickness is highly correlated with BMI | High correlation |
| BMI is highly correlated with SkinThickness | High correlation |
| Age is highly correlated with Outcome and 1 other field | High correlation |
| Outcome is highly correlated with Age and 1 other field | High correlation |
| Glucose is highly correlated with Outcome and 1 other field | High correlation |
| df_index has unique values | Unique |
| Pregnancies has 56 (14.3%) zeros | Zeros |
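Both kinds of warning above can be sketched as simple threshold checks. This is a hypothetical illustration: the 10% zeros threshold and the 0.9 correlation threshold are assumptions chosen for the example, not values read from this report's configuration, and both helper names are invented.

```python
def zeros_alert(name, values, threshold_pct=10.0):
    """Return a 'Zeros' warning string if the share of zeros exceeds threshold_pct.

    threshold_pct=10.0 is an illustrative assumption, not a pandas-profiling default.
    """
    n_zeros = sum(1 for v in values if v == 0)
    pct = 100.0 * n_zeros / len(values)
    if pct <= threshold_pct:
        return None
    return f"{name} has {n_zeros} ({pct:.1f}%) zeros"


def high_correlation_alerts(corr, threshold=0.9):
    """List 'X is highly correlated with Y' messages for |r| above threshold.

    `corr` maps (x, y) name pairs to coefficients; threshold=0.9 is an assumption.
    """
    return [
        f"{x} is highly correlated with {y}"
        for (x, y), r in corr.items()
        if x != y and abs(r) > threshold
    ]


print(zeros_alert("Pregnancies", [0] * 56 + [1] * 335))
# prints: Pregnancies has 56 (14.3%) zeros
```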
Reproduction
| Analysis started | 2021-09-01 01:15:17.265999 |
|---|---|
| Analysis finished | 2021-09-01 01:15:26.239070 |
| Duration | 8.97 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
df_index
Real number (ℝ≥0)
| Distinct | 391 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 387.5473146 |
| Minimum | 3 |
|---|---|
| Maximum | 765 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 46.5 |
| Q1 | 203.5 |
| median | 385 |
| Q3 | 567.5 |
| 95-th percentile | 722.5 |
| Maximum | 765 |
| Range | 762 |
| Interquartile range (IQR) | 364 |
Descriptive statistics
| Standard deviation | 216.102545 |
|---|---|
| Coefficient of variation (CV) | 0.5576159009 |
| Kurtosis | -1.150114152 |
| Mean | 387.5473146 |
| Median Absolute Deviation (MAD) | 182 |
| Skewness | -0.02494131874 |
| Sum | 151531 |
| Variance | 46700.30994 |
| Monotonicity | Strictly increasing |
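The quantile and descriptive statistics above follow standard definitions. The sketch below computes comparable values for a list of numbers using only the `statistics` module. It assumes the sample standard deviation (denominator n - 1), quantiles by linear interpolation between order statistics (`method="inclusive"`), MAD as the median absolute deviation from the median, and the monotonicity labels shown in the report; `describe` and `monotonicity` are hypothetical helpers and need at least two values.

```python
import statistics


def monotonicity(values):
    """Classify a sequence in its original order, using the report's labels."""
    pairs = list(zip(values, values[1:]))
    if all(b > a for a, b in pairs):
        return "Strictly increasing"
    if all(b >= a for a, b in pairs):
        return "Increasing"
    if all(b < a for a, b in pairs):
        return "Strictly decreasing"
    if all(b <= a for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


def describe(values):
    """Quantile and descriptive statistics for a list of at least two numbers."""
    data = sorted(values)
    mean = statistics.fmean(data)
    median = statistics.median(data)
    std = statistics.stdev(data)  # sample standard deviation, n - 1 denominator
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # n=20 yields cut points at 5%, 10%, ..., 95%; keep the first and last.
    ventiles = statistics.quantiles(data, n=20, method="inclusive")
    return {
        "minimum": data[0],
        "5th_percentile": ventiles[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "95th_percentile": ventiles[-1],
        "maximum": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
        "mean": mean,
        "std": std,
        "variance": statistics.variance(data),
        "cv": std / mean,  # coefficient of variation
        "mad": statistics.median([abs(x - median) for x in data]),
        "sum": sum(data),
        "monotonicity": monotonicity(list(values)),
    }


print(describe(list(range(1, 11))))
```

The reported df_index values are consistent with these definitions: 216.102545 / 387.5473146 ≈ 0.5576 (the CV), and 216.102545² ≈ 46700.31 (the variance).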
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 511 | 1 | 0.3% |
| 177 | 1 | 0.3% |
| 153 | 1 | 0.3% |
| 156 | 1 | 0.3% |
| 157 | 1 | 0.3% |
| 158 | 1 | 0.3% |
| 159 | 1 | 0.3% |
| 672 | 1 | 0.3% |
| 161 | 1 | 0.3% |
| 162 | 1 | 0.3% |
| Other values (381) | 381 | 97.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 1 | 0.3% |
| 4 | 1 | 0.3% |
| 6 | 1 | 0.3% |
| 8 | 1 | 0.3% |
| 13 | 1 | 0.3% |
| 14 | 1 | 0.3% |
| 16 | 1 | 0.3% |
| 18 | 1 | 0.3% |
| 19 | 1 | 0.3% |
| 20 | 1 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 765 | 1 | 0.3% |
| 763 | 1 | 0.3% |
| 760 | 1 | 0.3% |
| 755 | 1 | 0.3% |
| 753 | 1 | 0.3% |
| 751 | 1 | 0.3% |
| 748 | 1 | 0.3% |
| 747 | 1 | 0.3% |
| 745 | 1 | 0.3% |
| 744 | 1 | 0.3% |
Pregnancies
Real number (ℝ≥0)
| Distinct | 17 |
|---|---|
| Distinct (%) | 4.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.306905371 |
| Minimum | 0 |
|---|---|
| Maximum | 17 |
| Zeros | 56 |
| Zeros (%) | 14.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 2 |
| Q3 | 5 |
| 95-th percentile | 10 |
| Maximum | 17 |
| Range | 17 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 3.213421914 |
|---|---|
| Coefficient of variation (CV) | 0.9717308341 |
| Kurtosis | 1.476228718 |
| Mean | 3.306905371 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.331966876 |
| Sum | 1293 |
| Variance | 10.3260804 |
| Monotonicity | Not monotonic |
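Pregnancies is right-skewed (skewness ≈ 1.33) with positive excess kurtosis (≈ 1.48). The sketch below shows the bias-adjusted sample estimators G1 and G2 that pandas' `Series.skew()` and `Series.kurt()` compute; that the report uses these same estimators is an assumption here, and the helper names are hypothetical.

```python
import math


def _central_moments(values):
    """Return n and the 2nd, 3rd and 4th central moments (population form, 1/n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return n, m2, m3, m4


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (needs at least 3 values)."""
    if len(values) < 3:
        raise ValueError("skewness needs at least 3 values")
    n, m2, m3, _ = _central_moments(values)
    if m2 == 0:
        return 0.0  # constant input: treat as no skew
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def sample_excess_kurtosis(values):
    """Bias-adjusted excess kurtosis G2, near 0 for normal data (needs at least 4 values)."""
    if len(values) < 4:
        raise ValueError("kurtosis needs at least 4 values")
    n, m2, _, m4 = _central_moments(values)
    if m2 == 0:
        return 0.0  # constant input: treat as zero excess kurtosis
    g2 = m4 / m2 ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


print(sample_skewness([0, 0, 0, 0, 10]), sample_excess_kurtosis([1, 2, 3, 4, 5]))
```

A positive G1, as for Pregnancies, means the long tail lies to the right of the mean.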
Histogram with fixed size bins (bins=17)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 92 | 23.5% |
| 2 | 64 | 16.4% |
| 0 | 56 | 14.3% |
| 3 | 45 | 11.5% |
| 4 | 27 | 6.9% |
| 5 | 21 | 5.4% |
| 7 | 20 | 5.1% |
| 6 | 19 | 4.9% |
| 8 | 14 | 3.6% |
| 9 | 11 | 2.8% |
| Other values (7) | 22 | 5.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 56 | 14.3% |
| 1 | 92 | 23.5% |
| 2 | 64 | 16.4% |
| 3 | 45 | 11.5% |
| 4 | 27 | 6.9% |
| 5 | 21 | 5.4% |
| 6 | 19 | 4.9% |
| 7 | 20 | 5.1% |
| 8 | 14 | 3.6% |
| 9 | 11 | 2.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 17 | 1 | 0.3% |
| 15 | 1 | 0.3% |
| 14 | 1 | 0.3% |
| 13 | 3 | 0.8% |
| 12 | 5 | 1.3% |
| 11 | 5 | 1.3% |
| 10 | 6 | 1.5% |
| 9 | 11 | 2.8% |
| 8 | 14 | 3.6% |
| 7 | 20 | 5.1% |
Glucose
Real number (ℝ≥0)
| Distinct | 117 |
|---|---|
| Distinct (%) | 29.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 122.6956522 |
| Minimum | 56 |
|---|---|
| Maximum | 198 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 56 |
|---|---|
| 5-th percentile | 81 |
| Q1 | 99 |
| median | 119 |
| Q3 | 143 |
| 95-th percentile | 181 |
| Maximum | 198 |
| Range | 142 |
| Interquartile range (IQR) | 44 |
Descriptive statistics
| Standard deviation | 30.87081364 |
|---|---|
| Coefficient of variation (CV) | 0.2516047887 |
| Kurtosis | -0.4860331345 |
| Mean | 122.6956522 |
| Median Absolute Deviation (MAD) | 21 |
| Skewness | 0.5136803871 |
| Sum | 47974 |
| Variance | 953.0071349 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 100 | 14 | 3.6% |
| 99 | 10 | 2.6% |
| 129 | 9 | 2.3% |
| 95 | 8 | 2.0% |
| 88 | 8 | 2.0% |
| 128 | 7 | 1.8% |
| 126 | 7 | 1.8% |
| 109 | 7 | 1.8% |
| 117 | 7 | 1.8% |
| 84 | 6 | 1.5% |
| Other values (107) | 308 | 78.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 56 | 1 | 0.3% |
| 68 | 3 | 0.8% |
| 71 | 2 | 0.5% |
| 74 | 3 | 0.8% |
| 75 | 1 | 0.3% |
| 77 | 2 | 0.5% |
| 78 | 2 | 0.5% |
| 79 | 2 | 0.5% |
| 80 | 2 | 0.5% |
| 81 | 5 | 1.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 198 | 1 | 0.3% |
| 197 | 2 | 0.5% |
| 196 | 2 | 0.5% |
| 195 | 1 | 0.3% |
| 193 | 1 | 0.3% |
| 191 | 1 | 0.3% |
| 189 | 2 | 0.5% |
| 188 | 1 | 0.3% |
| 187 | 4 | 1.0% |
| 186 | 1 | 0.3% |
BloodPressure
Real number (ℝ≥0)
| Distinct | 37 |
|---|---|
| Distinct (%) | 9.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 70.68030691 |
| Minimum | 24 |
|---|---|
| Maximum | 110 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 24 |
|---|---|
| 5-th percentile | 50 |
| Q1 | 62 |
| median | 70 |
| Q3 | 78 |
| 95-th percentile | 90 |
| Maximum | 110 |
| Range | 86 |
| Interquartile range (IQR) | 16 |
Descriptive statistics
| Standard deviation | 12.50754012 |
|---|---|
| Coefficient of variation (CV) | 0.1769593352 |
| Kurtosis | 0.7915695858 |
| Mean | 70.68030691 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | -0.09122008091 |
| Sum | 27636 |
| Variance | 156.4385599 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=37)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 70 | 31 | 7.9% |
| 74 | 30 | 7.7% |
| 64 | 26 | 6.6% |
| 68 | 24 | 6.1% |
| 72 | 23 | 5.9% |
| 78 | 23 | 5.9% |
| 76 | 20 | 5.1% |
| 60 | 20 | 5.1% |
| 62 | 19 | 4.9% |
| 58 | 18 | 4.6% |
| Other values (27) | 157 | 40.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 24 | 1 | 0.3% |
| 30 | 2 | 0.5% |
| 38 | 1 | 0.3% |
| 40 | 1 | 0.3% |
| 44 | 3 | 0.8% |
| 46 | 2 | 0.5% |
| 48 | 3 | 0.8% |
| 50 | 10 | 2.6% |
| 52 | 6 | 1.5% |
| 54 | 8 | 2.0% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 110 | 2 | 0.5% |
| 106 | 2 | 0.5% |
| 102 | 1 | 0.3% |
| 100 | 2 | 0.5% |
| 98 | 1 | 0.3% |
| 94 | 2 | 0.5% |
| 92 | 1 | 0.3% |
| 90 | 11 | 2.8% |
| 88 | 15 | 3.8% |
| 86 | 11 | 2.8% |
SkinThickness
Real number (ℝ≥0)
| Distinct | 48 |
|---|---|
| Distinct (%) | 12.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29.15089514 |
| Minimum | 7 |
|---|---|
| Maximum | 63 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 7 |
|---|---|
| 5-th percentile | 13 |
| Q1 | 21 |
| median | 29 |
| Q3 | 37 |
| 95-th percentile | 46.5 |
| Maximum | 63 |
| Range | 56 |
| Interquartile range (IQR) | 16 |
Descriptive statistics
| Standard deviation | 10.52933596 |
|---|---|
| Coefficient of variation (CV) | 0.3612011197 |
| Kurtosis | -0.4641133551 |
| Mean | 29.15089514 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 0.2075296064 |
| Sum | 11398 |
| Variance | 110.8669159 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=48)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 32 | 20 | 5.1% |
| 30 | 18 | 4.6% |
| 33 | 17 | 4.3% |
| 23 | 16 | 4.1% |
| 18 | 16 | 4.1% |
| 29 | 14 | 3.6% |
| 26 | 14 | 3.6% |
| 28 | 13 | 3.3% |
| 27 | 13 | 3.3% |
| 25 | 12 | 3.1% |
| Other values (38) | 238 | 60.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 2 | 0.5% |
| 8 | 1 | 0.3% |
| 10 | 3 | 0.8% |
| 11 | 5 | 1.3% |
| 12 | 6 | 1.5% |
| 13 | 10 | 2.6% |
| 14 | 6 | 1.5% |
| 15 | 11 | 2.8% |
| 16 | 5 | 1.3% |
| 17 | 10 | 2.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 63 | 1 | 0.3% |
| 60 | 1 | 0.3% |
| 56 | 1 | 0.3% |
| 52 | 2 | 0.5% |
| 51 | 1 | 0.3% |
| 50 | 3 | 0.8% |
| 49 | 3 | 0.8% |
| 48 | 4 | 1.0% |
| 47 | 4 | 1.0% |
| 46 | 7 | 1.8% |
Insulin
Real number (ℝ≥0)
| Distinct | 184 |
|---|---|
| Distinct (%) | 47.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 156.2327366 |
| Minimum | 14 |
|---|---|
| Maximum | 846 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 14 |
|---|---|
| 5-th percentile | 42.5 |
| Q1 | 76.5 |
| median | 126 |
| Q3 | 190 |
| 95-th percentile | 397 |
| Maximum | 846 |
| Range | 832 |
| Interquartile range (IQR) | 113.5 |
Descriptive statistics
| Standard deviation | 118.9424319 |
|---|---|
| Coefficient of variation (CV) | 0.7613156788 |
| Kurtosis | 6.33576778 |
| Mean | 156.2327366 |
| Median Absolute Deviation (MAD) | 54 |
| Skewness | 2.161212146 |
| Sum | 61087 |
| Variance | 14147.30211 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 105 | 11 | 2.8% |
| 140 | 9 | 2.3% |
| 130 | 9 | 2.3% |
| 120 | 8 | 2.0% |
| 94 | 7 | 1.8% |
| 180 | 7 | 1.8% |
| 100 | 7 | 1.8% |
| 115 | 6 | 1.5% |
| 110 | 6 | 1.5% |
| 135 | 6 | 1.5% |
| Other values (174) | 315 | 80.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 14 | 1 | 0.3% |
| 15 | 1 | 0.3% |
| 16 | 1 | 0.3% |
| 18 | 2 | 0.5% |
| 22 | 1 | 0.3% |
| 23 | 1 | 0.3% |
| 25 | 1 | 0.3% |
| 29 | 1 | 0.3% |
| 32 | 1 | 0.3% |
| 36 | 3 | 0.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 846 | 1 | 0.3% |
| 744 | 1 | 0.3% |
| 680 | 1 | 0.3% |
| 600 | 1 | 0.3% |
| 579 | 1 | 0.3% |
| 545 | 1 | 0.3% |
| 543 | 1 | 0.3% |
| 540 | 1 | 0.3% |
| 510 | 1 | 0.3% |
| 495 | 2 | 0.5% |
BMI
Real number (ℝ≥0)
| Distinct | 194 |
|---|---|
| Distinct (%) | 49.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 307.9565217 |
| Minimum | 24 |
|---|---|
| Maximum | 671 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 24 |
|---|---|
| 5-th percentile | 33 |
| Q1 | 262.5 |
| median | 328 |
| Q3 | 368.5 |
| 95-th percentile | 449 |
| Maximum | 671 |
| Range | 647 |
| Interquartile range (IQR) | 106 |
Descriptive statistics
| Standard deviation | 105.992238 |
|---|---|
| Coefficient of variation (CV) | 0.3441792283 |
| Kurtosis | 1.79883416 |
| Mean | 307.9565217 |
| Median Absolute Deviation (MAD) | 50 |
| Skewness | -0.9517253983 |
| Sum | 120411 |
| Variance | 11234.35452 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 333 | 7 | 1.8% |
| 32 | 7 | 1.8% |
| 316 | 6 | 1.5% |
| 336 | 5 | 1.3% |
| 287 | 5 | 1.3% |
| 259 | 5 | 1.3% |
| 345 | 5 | 1.3% |
| 355 | 5 | 1.3% |
| 252 | 5 | 1.3% |
| 394 | 5 | 1.3% |
| Other values (184) | 336 | 85.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 24 | 3 | 0.8% |
| 25 | 1 | 0.3% |
| 26 | 2 | 0.5% |
| 28 | 1 | 0.3% |
| 29 | 2 | 0.5% |
| 30 | 3 | 0.8% |
| 31 | 1 | 0.3% |
| 32 | 7 | 1.8% |
| 34 | 4 | 1.0% |
| 35 | 2 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 671 | 1 | 0.3% |
| 594 | 1 | 0.3% |
| 573 | 1 | 0.3% |
| 532 | 1 | 0.3% |
| 523 | 1 | 0.3% |
| 497 | 1 | 0.3% |
| 479 | 1 | 0.3% |
| 468 | 1 | 0.3% |
| 467 | 1 | 0.3% |
| 465 | 1 | 0.3% |
DiabetesPedigreeFunction
Real number (ℝ≥0)
| Distinct | 329 |
|---|---|
| Distinct (%) | 84.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 481.1534527 |
| Minimum | 4 |
|---|---|
| Maximum | 2329 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 44.5 |
| Q1 | 245.5 |
| median | 422 |
| Q3 | 669.5 |
| 95-th percentile | 1086 |
| Maximum | 2329 |
| Range | 2325 |
| Interquartile range (IQR) | 424 |
Descriptive statistics
| Standard deviation | 340.6676582 |
|---|---|
| Coefficient of variation (CV) | 0.7080228901 |
| Kurtosis | 4.941573707 |
| Mean | 481.1534527 |
| Median Absolute Deviation (MAD) | 199 |
| Skewness | 1.596740552 |
| Sum | 188131 |
| Variance | 116054.4533 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 692 | 4 | 1.0% |
| 261 | 3 | 0.8% |
| 452 | 3 | 0.8% |
| 496 | 3 | 0.8% |
| 26 | 3 | 0.8% |
| 422 | 3 | 0.8% |
| 299 | 3 | 0.8% |
| 687 | 3 | 0.8% |
| 128 | 2 | 0.5% |
| 412 | 2 | 0.5% |
| Other values (319) | 362 | 92.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 1 | 0.3% |
| 6 | 1 | 0.3% |
| 14 | 1 | 0.3% |
| 15 | 2 | 0.5% |
| 16 | 2 | 0.5% |
| 23 | 1 | 0.3% |
| 24 | 1 | 0.3% |
| 26 | 3 | 0.8% |
| 27 | 1 | 0.3% |
| 28 | 2 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2329 | 1 | 0.3% |
| 2288 | 1 | 0.3% |
| 2137 | 1 | 0.3% |
| 1699 | 1 | 0.3% |
| 1391 | 1 | 0.3% |
| 1353 | 1 | 0.3% |
| 1321 | 1 | 0.3% |
| 1318 | 1 | 0.3% |
| 1292 | 1 | 0.3% |
| 1268 | 1 | 0.3% |
Age
Real number (ℝ≥0)
| Distinct | 43 |
|---|---|
| Distinct (%) | 11.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 30.89002558 |
| Minimum | 21 |
|---|---|
| Maximum | 81 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.2 KiB |
Quantile statistics
| Minimum | 21 |
|---|---|
| 5-th percentile | 21 |
| Q1 | 23 |
| median | 27 |
| Q3 | 36 |
| 95-th percentile | 52.5 |
| Maximum | 81 |
| Range | 60 |
| Interquartile range (IQR) | 13 |
Descriptive statistics
| Standard deviation | 10.20159252 |
|---|---|
| Coefficient of variation (CV) | 0.3302552307 |
| Kurtosis | 1.73201516 |
| Mean | 30.89002558 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 1.401759213 |
| Sum | 12078 |
| Variance | 104.07249 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=43)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 22 | 43 | 11.0% |
| 21 | 32 | 8.2% |
| 24 | 31 | 7.9% |
| 25 | 30 | 7.7% |
| 23 | 28 | 7.2% |
| 26 | 24 | 6.1% |
| 28 | 21 | 5.4% |
| 27 | 14 | 3.6% |
| 29 | 14 | 3.6% |
| 31 | 12 | 3.1% |
| Other values (33) | 142 | 36.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 21 | 32 | 8.2% |
| 22 | 43 | 11.0% |
| 23 | 28 | 7.2% |
| 24 | 31 | 7.9% |
| 25 | 30 | 7.7% |
| 26 | 24 | 6.1% |
| 27 | 14 | 3.6% |
| 28 | 21 | 5.4% |
| 29 | 14 | 3.6% |
| 30 | 10 | 2.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 81 | 1 | 0.3% |
| 63 | 1 | 0.3% |
| 61 | 1 | 0.3% |
| 60 | 2 | 0.5% |
| 59 | 1 | 0.3% |
| 58 | 4 | 1.0% |
| 57 | 2 | 0.5% |
| 56 | 1 | 0.3% |
| 55 | 2 | 0.5% |
| 54 | 2 | 0.5% |
Outcome
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.2 KiB |
| 0 | 261 |
|---|---|
| 1 | 130 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 391 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
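Python's standard `unicodedata` module exposes the general category of a code point as a two-letter code (for example `Nd` for Decimal Number), which is enough to reproduce category counts like the ones in this section. It does not expose script or block properties, so those would need another data source. The `CATEGORY_NAMES` mapping below covers only a few codes and, like `category_counts`, is a hypothetical convenience.

```python
import unicodedata
from collections import Counter

# unicodedata.category returns a two-letter code; these long names are a partial,
# hand-written mapping (hypothetical, extend as needed).
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
}


def category_counts(texts):
    """Count characters per Unicode general category across a sequence of strings."""
    counts = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


outcome = ["0"] * 261 + ["1"] * 130
print(category_counts(outcome))  # Counter({'Decimal Number': 391})
```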
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
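Distinct and Unique measure different things: Distinct counts how many different values occur, while Unique counts the values that occur exactly once. Since Outcome contains only `0` (261 times) and `1` (130 times), it has 2 distinct values but 0 unique ones. A minimal sketch, using a hypothetical helper:

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different values; Unique = number of values seen exactly once."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct": distinct,
        "distinct_pct": 100.0 * distinct / n,
        "unique": unique,
        "unique_pct": 100.0 * unique / n,
    }


print(distinct_and_unique(["0"] * 261 + ["1"] * 130))
```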
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 261 | 66.8% |
| 1 | 130 | 33.2% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 261 | 66.8% |
| 1 | 130 | 33.2% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 261 | 66.8% |
| 1 | 130 | 33.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 391 | 100.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 261 | 66.8% |
| 1 | 130 | 33.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 391 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 261 | 66.8% |
| 1 | 130 | 33.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 391 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 261 | 66.8% |
| 1 | 130 | 33.2% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that the steepness of a linear relationship (its angle to the x-axis) does not affect r; only the direction of the slope does. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
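The definition above translates directly into code. This minimal sketch (hypothetical helper `pearson_r`, standard library only) divides the sum of co-deviations by the square root of the product of the sums of squared deviations; the 1/(n - 1) factors of the covariance and the two standard deviations cancel.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of co-deviations and squared deviations; the 1/(n-1) factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))  # 0.8
# Shifting and positively rescaling either variable leaves r unchanged.
print(pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y]))
```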
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
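Spearman's ρ is Pearson's r applied to ranks. The sketch below assigns tied values the average of the ranks they span (a common convention, assumed here) and then applies the Pearson formula; `average_ranks` and `spearman_rho` are hypothetical helpers.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but perfectly monotonic
print(spearman_rho(x, y), pearson_r(x, y))
```

On this monotonic but nonlinear input ρ is exactly 1, while r falls short of 1.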
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. In the absence of ties, τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
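Counting concordant and discordant pairs amounts to a loop over all pairs of observations. The sketch computes the tie-free form (often called τ-a), in which tied pairs count as neither concordant nor discordant. Library implementations such as SciPy's `kendalltau` default to the tie-adjusted τ-b, so results differ when ties are present. The helper name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; tied pairs count as neither."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Same sign of the two differences -> concordant, opposite sign -> discordant.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant -> 4/6
```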
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. There is extensive documentation available here.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly pick out patterns in data completion visually.
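A nullity matrix marks, for every row and column, whether a value is present. The sketch below is a text-mode analogue of that plot, not the report's own rendering: it assumes rows are dictionaries and that `None` or a float NaN marks a missing value, and it prints `#` for present and `.` for missing cells. `nullity_matrix` is a hypothetical helper.

```python
import math


def nullity_matrix(rows, columns, present="#", missing="."):
    """One text line per row, one character per column: present or missing."""
    lines = []
    for row in rows:
        cells = []
        for col in columns:
            value = row.get(col)
            is_missing = value is None or (isinstance(value, float) and math.isnan(value))
            cells.append(missing if is_missing else present)
        lines.append("".join(cells))
    return lines


# Toy rows for illustration only (not taken from this dataset).
rows = [{"a": 1, "b": None}, {"a": float("nan"), "b": 2}, {"a": 3, "b": 4}]
for line in nullity_matrix(rows, ["a", "b"]):
    print(line)
```

Because this dataset has 0 missing cells, every line of its matrix would consist only of present markers.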
First rows
|  | df_index | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 1.0 | 89.0 | 66.0 | 23.0 | 94.0 | 281.0 | 167.0 | 21 | 0 |
| 1 | 4 | 0.0 | 137.0 | 40.0 | 35.0 | 168.0 | 431.0 | 2288.0 | 33 | 1 |
| 2 | 6 | 3.0 | 78.0 | 50.0 | 32.0 | 88.0 | 31.0 | 248.0 | 26 | 1 |
| 3 | 8 | 2.0 | 197.0 | 70.0 | 45.0 | 543.0 | 305.0 | 158.0 | 53 | 1 |
| 4 | 13 | 1.0 | 189.0 | 60.0 | 23.0 | 846.0 | 301.0 | 398.0 | 59 | 1 |
| 5 | 14 | 5.0 | 166.0 | 72.0 | 19.0 | 175.0 | 258.0 | 587.0 | 51 | 1 |
| 6 | 16 | 0.0 | 118.0 | 84.0 | 47.0 | 230.0 | 458.0 | 551.0 | 31 | 1 |
| 7 | 18 | 1.0 | 103.0 | 30.0 | 38.0 | 83.0 | 433.0 | 183.0 | 33 | 0 |
| 8 | 19 | 1.0 | 115.0 | 70.0 | 30.0 | 96.0 | 346.0 | 529.0 | 32 | 1 |
| 9 | 20 | 3.0 | 126.0 | 88.0 | 41.0 | 235.0 | 393.0 | 704.0 | 27 | 0 |
Last rows
|  | df_index | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|---|
| 381 | 744 | 13.0 | 153.0 | 88.0 | 37.0 | 140.0 | 406.0 | 1174.0 | 39 | 0 |
| 382 | 745 | 12.0 | 100.0 | 84.0 | 33.0 | 105.0 | 30.0 | 488.0 | 46 | 0 |
| 383 | 747 | 1.0 | 81.0 | 74.0 | 41.0 | 57.0 | 463.0 | 1096.0 | 32 | 0 |
| 384 | 748 | 3.0 | 187.0 | 70.0 | 22.0 | 200.0 | 364.0 | 408.0 | 36 | 1 |
| 385 | 751 | 1.0 | 121.0 | 78.0 | 39.0 | 74.0 | 39.0 | 261.0 | 28 | 0 |
| 386 | 753 | 0.0 | 181.0 | 88.0 | 44.0 | 510.0 | 433.0 | 222.0 | 26 | 1 |
| 387 | 755 | 1.0 | 128.0 | 88.0 | 39.0 | 110.0 | 365.0 | 1057.0 | 37 | 1 |
| 388 | 760 | 2.0 | 88.0 | 58.0 | 26.0 | 16.0 | 284.0 | 766.0 | 22 | 0 |
| 389 | 763 | 10.0 | 101.0 | 76.0 | 48.0 | 180.0 | 329.0 | 171.0 | 63 | 0 |
| 390 | 765 | 5.0 | 121.0 | 72.0 | 23.0 | 112.0 | 262.0 | 245.0 | 30 | 0 |